Simplified amino acid alphabets for protein fold recognition and implications for folding.
Abstract
Protein design experiments have shown that the use of specific subsets of amino acids can produce foldable proteins. This prompts the question of whether there is a minimal amino acid alphabet which could be used to fold all proteins. In this work we make an analogy between sequence patterns which produce foldable sequences and those which make it possible to detect structural homologs by aligning sequences, and use it to suggest the possible size of such a reduced alphabet. We estimate that reduced alphabets containing 10-12 letters can be used to design foldable sequences for a large number of protein families. This estimate is based on the observation that there is little loss of the information necessary to pick out structural homologs in a clustered protein sequence database when a suitable reduction of the amino acid alphabet from 20 to 10 letters is made, but that this information is rapidly degraded when further reductions in the alphabet are made.
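The core operation described in the abstract can be sketched in Python: collapse the 20-letter amino acid alphabet into about 10 groups, then compare sequences in the reduced alphabet. This is a minimal sketch under stated assumptions. The grouping (LVIM, C, A, G, ST, P, FYW, EDNQ, KR, H) follows a 10-group partition often cited as this paper's 10-letter alphabet, but it should be checked against the paper before reuse. The names `TEN_LETTER_GROUPS`, `reduce_sequence` and `percent_identity` are hypothetical, and the toy sequences are invented. The sketch only shows how two sequences that differ in the full alphabet can become identical after reduction. It is not the authors' procedure for picking out structural homologs in a clustered protein sequence database.

```python
# Hypothetical 10-group partition of the 20 standard amino acids.
# ASSUMPTION: modeled on the 10-letter grouping often cited for this paper
# (LVIM, C, A, G, ST, P, FYW, EDNQ, KR, H); verify it against the original.
TEN_LETTER_GROUPS = ["LVIM", "C", "A", "G", "ST", "P", "FYW", "EDNQ", "KR", "H"]


def build_reduction_map(groups):
    """Map each amino acid to its group's representative (the group's first letter)."""
    mapping = {}
    for group in groups:
        for residue in group:
            if residue in mapping:
                raise ValueError(f"residue {residue} appears in more than one group")
            mapping[residue] = group[0]
    return mapping


REDUCE_10 = build_reduction_map(TEN_LETTER_GROUPS)


def reduce_sequence(seq, mapping=REDUCE_10):
    """Rewrite a one-letter protein sequence in the reduced alphabet; gaps ('-') are kept."""
    return "".join("-" if aa == "-" else mapping[aa] for aa in seq.upper())


def percent_identity(a, b):
    """Percent identity of two pre-aligned, equal-length sequences, ignoring gapped columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


# Toy, invented sequences: every position is a within-group substitution.
s1, s2 = "MKVLIT", "LRIVMS"
print(percent_identity(s1, s2))  # 0.0 in the full 20-letter alphabet
print(percent_identity(reduce_sequence(s1), reduce_sequence(s2)))  # 100.0 after reduction
```

In this toy case, conservative substitutions such as V to I or K to R become exact matches once the alphabet is reduced. This illustrates how a well-chosen reduction can preserve similarity between related sequences. Merging too many groups would instead start to make unrelated sequences look alike, which is consistent with the abstract's observation that information degrades rapidly below about 10 letters.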
Similar resources
Simplifying amino acid alphabets by means of a branch and bound algorithm and substitution matrices
MOTIVATION Protein and DNA are generally represented by sequences of letters. In a number of circumstances simplified alphabets (where one or more letters would be represented by the same symbol) have proved their potential utility in several fields of bioinformatics including searching for patterns occurring at an unexpected rate, studying protein folding and finding consensus sequences in mul...
Structural Characteristics of Stable Folding Intermediates of Yeast Iso-1-Cytochrome-c
Cytochrome-c (cyt-c) is an electron transport protein that is present throughout evolution. More than 280 sequences have been reported in the protein sequence database (www.uniprot.org). Though sequentially diverse, cyt-c has essentially retained its tertiary structure or fold. Thus a vast data set of varied sequences with retention of similar structure and fun...
Neutral networks in protein space: a computational study based on knowledge-based potentials of mean force.
BACKGROUND Many protein sequences, often unrelated, adopt similar folds. Sequences folding into the same shape thus form subsets of sequence space. The shape and the connectivity of these sets have implications for protein evolution and de novo design. RESULTS We investigate the topology of these sets for some proteins with known three-dimensional structure using inverse folding techniques. F...
Surveying determinants of protein structure designability across different energy models and amino-acid alphabets: A consensus
A variety of analytical and computational models have been proposed to answer the question of why some protein structures are more “designable” (i.e., have more sequences folding into them) than others. One class of analytical and statistical-mechanical models has approached the designability problem from a thermodynamic viewpoint. These models highlighted specific structural features importa...
Evaluation of local structure alphabets based on residue burial.
Residue burial, which describes a protein residue's exposure to solvent and neighboring atoms, is key to protein structure prediction, modeling, and analysis. We assessed 21 alphabets representing residue burial, according to their predictability from amino acid sequence, conservation in structural alignments, and utility in one fold-recognition scenario. This follows upon our previous work in ...
Journal title:
- Protein Engineering
Volume 13, Issue 3
Publication date: 2000